So welcome everybody. Today Jonas Utz will present a paper by Jake Snell and colleagues
entitled "Prototypical Networks for Few-shot Learning". Jonas, the stage is yours.
Alright, thank you for the kind introduction and let's start. So this talk is divided into
three major parts. First, we will have a gentle introduction to few-shot learning with
the most important terminology, but the main part of the talk is of course the concept
of prototypical networks. Alongside the mathematical derivations and the core algorithm, we will
see different interpretations of prototypical networks and hear about major design choices
as well as zero-shot learning. The third part of this talk will be about the performed experiments
on three major datasets for meta-learning. The results are also compared with state-of-the-art
meta-learning algorithms. Alright, so let's start with an introduction to few-shot learning.
Let's have a look at the most important terminology of few-shot learning. Some of the terms were
already mentioned in last week's talk by Aka and Benjamin, but I think it's good to revise
them. So what is few-shot classification? In few-shot classification, the classifier needs
to classify classes at test time which were not seen at training time. To do so,
few samples of these unseen classes are provided. We'll hear more about these classes in a second.
When we talk about few-shot classification, we usually specify how many classes we want
to classify and how many support samples we provide for these classes. The number of classes
is called the way and the number of samples is called the shot. So the usual terminology
is k-way n-shot classification; for example, 5-way 1-shot means five classes with one support sample each. There are two special cases which usually gain attention.
These are one-shot classification, where we have only one support sample per class, and
zero-shot classification or learning, where we have no support samples at all.
We can actually do one-shot learning easily ourselves. Here we see an example. I think
the one with the Segway is pretty easy, but on the right you also see some characters
from an alphabet unknown to us. Given the one example in the box, you can now guess
which of these characters belong to the same class. So find the remaining characters. They
are actually in the second row, third column, and in the fourth row, second column.
So we're working with two sets as previously mentioned, the support set and the query set.
Our support set comes with labels, and the number of support samples is denoted as N
and is determined by the number of shots, as I introduced in the previous slide. Each
sample in the support set is represented by a d-dimensional feature vector and of course
a scalar label. The number of classes in the support set is given by the ways, again as
I mentioned in the previous slide. The second set is the query set and these are the samples
we want to classify, and at test time we do not have labels for these. The classes in
the query set and the support set are the same. And since we are performing
our predictions on the query set, these are also the samples which we use for computation
of the loss during training and accuracy during testing.
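To make the episode structure concrete, here is a small sketch of how a k-way n-shot episode with a support and a query set could be sampled from a labelled dataset. The function name and the dictionary layout of the data are illustrative assumptions, not something taken from the paper or the talk.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one few-shot episode (illustrative sketch, not the authors' code).

    data_by_class: dict mapping class label -> list of samples for that class.
    Returns support and query sets as lists of (sample, episode_label) pairs.
    """
    # Pick n_way classes for this episode.
    episode_classes = random.sample(list(data_by_class.keys()), n_way)

    support, query = [], []
    for episode_label, cls in enumerate(episode_classes):
        # Draw k_shot support samples and n_query query samples per class.
        samples = random.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in samples[:k_shot]]
        query += [(x, episode_label) for x in samples[k_shot:]]
    return support, query
```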
Alright, so now let's look at prototypical networks and the actual model idea. To start
with we look at each of our support samples and put them into an embedding function denoted
here as f phi. This maps our d-dimensional feature vector into the m-dimensional embedding
space. In the embedding space we sum over all embeddings of a class and divide
by the number of samples per class; this is simply the mean of the class embeddings,
which we will call the prototype. Now that we have computed class prototypes from our support
set we want to use those for computing class probabilities for our query samples. For that
we need a distance function, for example the squared Euclidean distance. Now we also put
the query sample into the embedding function to bring it into the embedding
space. There we simply evaluate the distance to all of the class prototypes, and now we
want to obtain probabilities and we can do that with a very well-known formula. At least
I hope you know it: the softmax function. So we simply plug our negative distances into
the softmax function and obtain the probabilities for our classes.
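Written out, with S_k denoting the support samples of class k, f_phi the embedding function, and d the squared Euclidean distance, the prototype and the softmax over negative distances from the paper are:

```latex
c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i),
\qquad
p_\phi(y = k \mid x) =
  \frac{\exp\left(-d\left(f_\phi(x), c_k\right)\right)}
       {\sum_{k'} \exp\left(-d\left(f_\phi(x), c_{k'}\right)\right)}
```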
We train this by minimizing the negative log-likelihood of the true class.
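Putting the whole forward pass together, here is a minimal PyTorch-style sketch of the computation just described, assuming an embedding network f_phi and integer class labels 0 to n_way-1 within the episode. The function name and tensor layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(f_phi, support_x, support_y, query_x, query_y, n_way):
    """One episode of a prototypical network (illustrative sketch).

    f_phi     : embedding network mapping inputs to m-dimensional vectors.
    support_* : support samples and integer labels in {0, ..., n_way - 1}.
    query_*   : query samples and integer labels in {0, ..., n_way - 1}.
    """
    z_support = f_phi(support_x)               # (N_support, m)
    z_query = f_phi(query_x)                   # (N_query, m)

    # Class prototypes: mean embedding of each class's support samples.
    prototypes = torch.stack(
        [z_support[support_y == k].mean(dim=0) for k in range(n_way)]
    )                                           # (n_way, m)

    # Squared Euclidean distance of every query embedding to every prototype.
    dists = torch.cdist(z_query, prototypes) ** 2   # (N_query, n_way)

    # Softmax over negative distances gives class probabilities;
    # cross_entropy on these logits is the negative log-likelihood.
    loss = F.cross_entropy(-dists, query_y)
    acc = (dists.argmin(dim=1) == query_y).float().mean()
    return loss, acc
```

In the paper, this episode loss is minimized over many randomly sampled training episodes with stochastic gradient descent.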
Today Jonas Utz presents the paper "Prototypical Networks for Few-shot Learning"
We propose Prototypical Networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend Prototypical Networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.